ABSTRACT

The concept of confidence, the use and computation of 
confidence and tolerance limits, and a method for determining 
the significance of the difference in two or more sample means 
are discussed in an elementary manner. 


TECHNICAL MEMORANDUM No. 1113




TABLE OF CONTENTS

                                                          Page

Acknowledgement                                            iii
Foreword                                                     v
Abstract                                                    vi
List of Tables                                            viii
Introduction                                                 1
PART I     The Concept of Confidence                         3
PART II    Confidence Limits for a Binomial Distribution     6
PART III   Standard Deviation, Normal Distribution,
           and Tolerance Limits                             23
PART IV    Confidence Limits for an Average                 33
PART V     The Significance of an Observed Difference
           in Averages for Two Series of Tests              37
PART VI    Confidence Limits for a Standard Deviation       49




LIST OF TABLES

                                                               Page

TABLE I     Confidence Limits (percent) for Binomial
            Distribution                                         13
TABLE II    Confidence Limits (percent) for Binomial
            Distribution                                      14-15
TABLE III   Confidence Limits (percent) for Binomial
            Distribution                                      16-17
TABLE IV    Confidence Limits (percent) for Binomial
            Distribution                                      18-19
TABLE V     Upper or Maximum Confidence Limit (percent)
            for a Series of Tests Without a Failure              20
TABLE VI    Number of Tests to be Performed Without a
            Failure                                              22
GRAPH       Normal Frequency Curve                               27
TABLE A     Tolerance Factors for 90 Percent Confidence          30
TABLE B     Tolerance Factors for 95 Percent Confidence          31
TABLE C     Tolerance Factors for 99 Percent Confidence          32
TABLE       Student's t Distribution                             36
GRAPH       Student's t Distribution                             47
TABLE       B Factors                                            51

INTRODUCTION

"The practice sometimes followed of consulting
the statistician only after the experiment is completed
and asking him 'what he can make of the results'
cannot be too strongly condemned. It is
essential to have the experiment in a form suitable
for analysis and in general this can be attained
by designing the experiment in consultation
with the statistician or with due regard to the
statistical principles involved."

R. C. Bowden,
Director of Ordnance Factories (Explosives),
Ministry of Supply, Great Britain.

An electronic experimenter possesses a sine wave generator 
which can economically manufacture millions or even billions of 
essentially identical repetitive input signals for a circuit that 
he may be developing. He can therefore base his conclusions 
relative to the performance of this circuit with respect to input 
signal and with respect to changes in circuit components upon a 
sufficient number of observations so that the conclusions can be 
accepted without reservation. 

The developer of rocket ordnance material is not so fortunate. 

The cost in time and money of a guided missile or even a small
component such as a fuze is so great that conclusions which may
involve the expenditure of millions of dollars and the strengthening
or weakening of the national security must, of necessity, be
made on the basis of very small numbers of tests. Great strides
have been made in the last few years in the science of statistics;
many of these advances have been made since the majority of
practicing scientists and engineers received their formal professional
training. Although it may be tacitly assumed that professional
personnel engage in outside reading which advises them of all
significant advances in all professional fields which may conceivably
relate to their own, the writer questions the validity of this
assumption, and has noted that professional personnel generally maintain
their proficiency in their own specialty only and may very likely
forget even the basic essentials of the other professions which were
taught in their undergraduate orientation courses.

The advice and assistance of a professional statistician are 
the first prerequisites to the performance of a successful series 
of tests of a small number of expensive items from which important 
conclusions must be drawn. It is the engineer, however, who must
determine when to consult the statistician, just as it is the
patient who must determine when to visit the physician. As assistant
for engineering operations in the Rocket Division, the writer noted
that many of the engineers in that and other divisions were not
thinking or planning statistically; the statistician was a gentleman
who "worked upstairs somewhere" with a lot of tables which
could be used to add lustre to the phraseology of a final report.

The writer concluded that the valuable assistance of the professional
statistician would not be utilized until each engineer
concerned with the planning or the execution of ordnance experimental
programs incorporated a statistical attitude into his daily thinking.
In an effort to facilitate this incorporation, the writer conducted
a series of statistical discussions for project engineers and prepared
the "first aid notes" therefor. If the title leads any reader
to believe that a statistician should be consulted only after
difficulties have been encountered in the experimental program, this is
regrettable, but the writer suspects that this reader would not have
consulted the statistician at all otherwise, so the net effect may be
beneficial, at least with respect to future programs.

The style of this memorandum is deliberately informal and
pedagogical. No attempt has been made to provide a comprehensive
introduction to the field of statistics; conventional first aid
notes are not an introduction to the field of medicine. This is
a sample, in the culinary sense, which it is hoped will whet the
appetite of the reader.
PART I

"We see that the theory of probabilities is 
at bottom only commonsense reduced to calculation: 
it makes us appreciate with exactitude what reason- 
able minds feel by a sort of instinct." — P. S. Laplace 

Purpose:

To discuss the concept of confidence. 

a. The owl has gained a reputation for wisdom, which
ornithologists say is undeserved, by maintaining an astute look
and never making an incorrect statement. It should be noted,
however, that he never makes ANY statements. The scientific
experimenter finds that an astute look is not a liability,
but his supervisor requires him to make statements and draw
generalized conclusions on the basis of his experiments.

b. Obviously, the experimenter who makes definitive
statements relative to the performance that may be expected of
a large number of rockets, solely on the basis of the test
results accumulated from five or ten firings, will sometimes
be wrong in these statements. A confidence level or confidence
coefficient is a state of mind and reflects how anxious the
owner of that mind is to avoid being wrong in his statements
or conclusions.

c. A cautious, conservative person who buys safe investments,
wears a belt and suspenders, and qualifies his statements
carefully is operating on a high confidence level. He is certain
he won't be wrong very often. If he is wrong once in 100 times,
he is operating on a 99 percent confidence level. A less
conservative person who takes more chances will be wrong oftener,
and hence he operates on a lower confidence level. If he is
wrong one time in 20 statements, he is operating on a 95 percent
confidence level. The confidence level, therefore, merely specifies
the percentage of the statements that a person expects to be correct.
If the experimenter selects too high a confidence level, his test
program becomes prohibitively expensive in time and money before
he reaches any very precise conclusions. If the confidence level
is too low, precise conclusions are easily reached, but these
conclusions will be wrong too frequently, and this in turn is
too expensive in time and money if a large quantity of the item
is made on the basis of erroneous conclusions. There is no ready
answer to this dilemma. I favor using a low confidence level
when tests are exploratory and progressively raising the
confidence level as the finished item is evaluated. This practice
is not always applicable, however.

d. Tables will be presented in the other parts of this
memorandum which permit the experimenter to determine what
conclusion he can draw for the confidence level that he selects.
If he selects a confidence level of 95 percent, he uses the
95 percent confidence table, and the conclusions that he draws
in accordance with this table will be in error only one time in
twenty, on the average.
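Paragraph d can be checked by simulation. The sketch below is an illustration only, not a procedure from this memorandum: it assumes a normally distributed population whose standard deviation is known, draws many samples of five, makes a 95 percent interval statement about the population average from each sample, and counts how often the statement is wrong. The population values, the sample size, and the function name `fraction_wrong` are hypothetical.

```python
import random
from statistics import NormalDist

def fraction_wrong(trials=20000, n=5, true_mean=100.0, sigma=2.0,
                   level=0.95, seed=1):
    """Fraction of interval statements that miss the true mean.

    Hypothetical setup: sigma is assumed known, so each statement is
    'the true mean lies within sample mean +/- z * sigma / sqrt(n)'.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 percent
    half_width = z * sigma / n ** 0.5
    wrong = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if abs(sample_mean - true_mean) >= half_width:
            wrong += 1  # the statement made from this sample is wrong
    return wrong / trials

print(fraction_wrong())  # close to 0.05, i.e. about one time in twenty
```

Raising `level` to 0.99 makes wrong statements rarer but widens every interval, which is the trade-off described in paragraph c.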

e. Many statisticians prefer the term Confidence Coefficient
to Confidence Level; the meaning is the same.

f. Some authors prefer to consider the confidence level as
representing the probability that a given statement is correct.
The reader may adopt this concept if he wishes, but I would not
recommend it, as I believe that it involves a philosophical
concept which eventually hinders an understanding of the subject.
A further discussion of this difference in viewpoints is given in
a footnote which may be of interest to some readers.

FOOTNOTE 

If a coin is flipped, the chances are approximately one out
of two, before the coin lands, that it will be heads. After it
lands, but even before I look at it, the uncertainty has ended
and it is either heads or tails, and it does not seem correct
for me to then state that the probability of it being heads is
one out of two. Everyone else in the room may have already
looked at it and knows that it is tails. If I shuffle a deck
of cards and then ask what the chances are that the top card
is the ace of spades, the answer is not one in 52; to my way of
thinking the question is meaningless. The top card is either
the ace of spades or it isn't, and no chances are involved. If
I state that it is NOT the ace of spades, I know that I am
operating on a confidence level of 51/52, or approximately 98 percent;
that is, I will be right in 51 out of every 52 such statements
that I make, on the average.

Another individual might prefer to state that there is a
probability of 51 in 52 that the top card is not the ace of
spades, and that a statement to the effect that the top card is not
the ace of spades is therefore made with a probability of 51/52,
or approximately 98 percent, of being correct.

In the case of an experimental rocket that may be tested
for burning distance, I would prefer to state that the true
average burning distance is established by the rocket design
and exists whether any rockets are ever built or fired, although
no one knows what this distance is. If I do test a few, I can
then make a statement relative to this distance, with appropriate
limits, and I will be right or wrong, although I do not know if
I'm right or wrong. I do know that I will be right in such
statements a certain known percentage of the time, say 95 percent,
which represents my confidence level, because I got my statement
from a 95 percent confidence table. I do not say that my
probability of being right on a given statement is 95 percent,
as I am either right or wrong, and I prefer not to imply that
my ignorance of the true fact creates a probability. Some
authors on the subject do refer to the probability of being
right on a given statement, however, and for them the
confidence level might be defined as the probability of being
right in any given statement. In this memorandum, the confidence
level will refer to the percentage of his statements that the
experimenter may expect to be correct.
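The card statement in the footnote can be checked in the same spirit. The sketch below is a hypothetical illustration rather than anything from the memorandum: it repeatedly shuffles a deck, always makes the statement "the top card is not the ace of spades," and records the fraction of statements that prove correct. Card 0 stands for the ace of spades by assumption, and `fraction_right` is an illustrative name.

```python
import random
from fractions import Fraction

EXACT = Fraction(51, 52)  # 51 of the 52 cards make the statement right

def fraction_right(shuffles=20000, seed=7):
    """Shuffle a deck repeatedly; count how often 'the top card is not
    the ace of spades' is a correct statement."""
    rng = random.Random(seed)
    deck = list(range(52))  # card 0 represents the ace of spades
    right = 0
    for _ in range(shuffles):
        rng.shuffle(deck)
        if deck[0] != 0:
            right += 1
    return right / shuffles

print(float(EXACT))       # about 0.98
print(fraction_right())   # close to the exact value
```

No single shuffle carries a probability in this view; the 98 percent describes the long run of statements, which is the sense of confidence level adopted in this memorandum.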
PART II

"Mathematics may be compared to a mill of exquisite
workmanship which grinds you stuff of any degree of
fineness, but nevertheless what you get out depends
on what you put in; and as the grandest mill in the
world will not extract wheat flour from peaspods, so
pages of formulae will not get a definite result out
of loose data."— T. H. Huxley 

Purpose: 

To define confidence limits.

a. Confidence levels and confidence limits have been confused
in the minds of many professional personnel. A level and a limit
are not the same. Confidence limits are merely the computed upper
and lower limits of the desired value of a physical quantity. For
example, the average height of the Joshua tree might be of interest.

b. I could measure the height of a number of Joshua trees
picked at random in a number of localities and compute very accurately
the average height of those I had measured. Suppose this value were
15 feet 6 inches and I had measured 20 trees. I might decide to state
in a published report that the average height of Joshua trees in
general was between 14 and 17 feet, which would be my limits. If
someone else said 15 and 16 feet, their limits would be 15 and 16 feet.
Obviously the latter limits are rather close for a measurement based
upon only 20 trees, and these limits would be chosen by a person
operating on a lower confidence level than the person who selected the
limits of 14 and 17 feet.

c. In other words, the person who sets his limits quite closely
to his observed value will have a lower percentage of his statements
correct than a person who is not so precise. If the above limits
were obtained from suitable statistical tables, they would be called
confidence limits on the average height of the Joshua tree. The
difference between the upper and lower limit is known as the
confidence interval. If a botanical experimenter had determined that
his reputation would not be seriously affected if he were right in
only 95 percent of his statements, he would have gotten his limits
by means of a 95 percent confidence table, and he would refer to the
limits so derived as 95 percent confidence limits and the interval
between as a 95 percent confidence interval. He therefore states
positively that the average height of the Joshua tree is between
these limits, but he knows that he will be wrong in 5 percent of
such statements.

d. Since most tables yield central confidence intervals, he
also knows that he will be wrong in 2 1/2 percent of his statements
because the true value was equal to or less than his lower confidence
limit, and wrong in 2 1/2 percent of his statements because the true
value was equal to or greater than his upper confidence limit.

e. It is therefore obvious that he would be operating on a
97 1/2 percent confidence level if he had made a statement utilizing
only one of the limits. Such a statement might be: "The average
height of the Joshua tree is less than ___ feet." Even though he
used the same numerical value as before, he knows that 97 1/2 percent
of such statements will be correct, as the true value will be equal
to or greater than the limit for only 2 1/2 percent of his statements.
If he had wished to stay on the 95 percent confidence level and
specify only one limit, the numerical value of the maximum limit
could be a little lower, because he can then afford to be wrong in
5 percent of his statements because the true value was equal to or
greater than his limit. It is generally better to retain the same
confidence level and compute the desired limit or pair of limits
accordingly.
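The arithmetic behind paragraphs d and e can be sketched for the simple case in which the estimate is normally distributed with a known standard deviation, so that standard normal multipliers apply. The average of 15.5 feet and the standard error of 0.5 foot below are hypothetical figures chosen only for illustration; the limits in this memorandum come from its own tables.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# A central 95 percent interval leaves 2 1/2 percent in each tail, so
# either of its limits used alone is a 97 1/2 percent single limit.
z_interval = std_normal.inv_cdf(0.975)   # about 1.960
# A 95 percent single (maximum) limit puts the whole 5 percent in one tail.
z_single = std_normal.inv_cdf(0.95)      # about 1.645

estimate, std_error = 15.5, 0.5  # hypothetical Joshua tree figures, feet
upper_limit = estimate + z_interval * std_error    # about 16.48 ft
maximum_limit = estimate + z_single * std_error    # about 16.32 ft
print(round(upper_limit, 2), round(maximum_limit, 2))
```

The 95 percent maximum limit comes out lower than the upper limit of the 95 percent interval, which is the point made at the end of paragraph e.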

f. In the case of the Joshua tree, the more conventional approach
described in paragraph c will yield more informative statements for any
specified confidence level than the use of a single confidence limit
as illustrated in paragraph e. As will be shown later, however, for
certain problems in ordnance engineering, the specification of only
one limit may be desirable. Inasmuch as the words "upper" and "lower"
as applied to confidence limits have long been associated with the
specification of a confidence interval to which the confidence level
applies, I wish to propose the terms "maximum confidence limit" and
"minimum confidence limit" to define limits which will be used in a
singular sense, with no reference to each other or to a confidence
interval. (The maximum confidence limit and the upper confidence
limit are therefore not synonymous for a specified confidence level.)

g. The technique for computing confidence limits for an average 
from observed data is given in Part IV of this memorandum, as some 
terms must be defined in Part III before this operation can be 
explained. 
h. For a little practice in the use of confidence limits,
consider the type of tests in which only two answers are possible.
The drop testing of light bulbs might be a case: either the bulb
breaks or it doesn't. The gauging of metal parts is another; the
firing of detonators on a specified electrical current is a third.
Either an anticipated event occurs or it doesn't; no measured data
are involved; the answer is yes or no. Assume that 10 point
detonating bomb fuzes are dropped and three fail to function. The
best estimate of the failure rate of an infinite number of identical
fuzes is the one observed, that is, 30 per cent, but what may be
said relative to a confidence interval for the failure rate?

i. Referring to Table III, it may be seen that the 95 per cent
confidence interval is from 7 per cent to 65 per cent, the lower
and upper confidence limits respectively. A statement suitable for
a report might be: "A sample failure rate for the subject fuze of
30 per cent was observed for ten samples, and the writer is 95 per
cent confident that the true failure rate lies between 7 per cent
and 65 per cent." Ninety-five of every one hundred such statements
will be correct on the average. The specification of the number of
fuzes tested is not essential, but it takes little space and enables
a reader to check the statement. Unless the experimenter knows
the use that is to be made of his report, this statement employing
a confidence interval is the most informative and should be given.

j. If the tests in the preceding paragraph were made upon
ten fuzes picked at random from a lot that had been in storage,
for the purpose of deciding whether to issue or rework the fuzes,
the man responsible for making the decision might be more interested
in the following singular limit statement, which could be made after
reference to Table II: "A sample failure rate for the subject
fuze of 30 per cent was observed for ten samples, and the writer
is 95 per cent confident that the true failure rate is less than
61 per cent." (Since I have invented the terms "maximum" and
"minimum" confidence limits, their use is not recommended in a formal
report, but the recommended statement permits no ambiguity. Sixty-one
per cent is the maximum confidence limit.)

k. Conversely, if the fuzes were models of an experimental
fuze, the man responsible for deciding whether to make more fuzes
to this design might be more interested in a singular limit type
of statement as follows: "A sample failure rate for the subject
fuze of 30 per cent was observed for ten samples, and the writer
is 95 per cent confident that the true failure rate will exceed
9 per cent if a large number of fuzes are made in accordance with
this design." This statement is also taken from Table II.
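The memorandum does not say how Tables I through IV were computed. Assuming they are exact binomial limits (the method usually credited to Clopper and Pearson), the sketch below recomputes, for 3 failures in 10 tests, the 95 per cent interval and the 61 per cent maximum limit quoted in paragraphs i and j, together with the corresponding 95 per cent minimum limit. The function names are hypothetical, and simple bisection is used in place of beta-distribution formulas so that only the Python standard library is needed.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def _bisect_decreasing(f, target, iters=60):
    """Find p in [0, 1] where the decreasing function f(p) equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid      # f still above target: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def max_limit(x, n, alpha):
    """Singular maximum limit: the rate p at which P(X <= x) falls to alpha."""
    if x == n:
        return 1.0
    return _bisect_decreasing(lambda p: binom_cdf(x, n, p), alpha)

def min_limit(x, n, alpha):
    """Singular minimum limit: the rate p at which P(X >= x) rises to alpha."""
    if x == 0:
        return 0.0
    # P(X >= x) = 1 - P(X <= x - 1), so solve P(X <= x - 1) = 1 - alpha.
    return _bisect_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha)

failures, tests = 3, 10
# 95 per cent interval: 2 1/2 per cent is allowed in each tail.
interval = (min_limit(failures, tests, 0.025), max_limit(failures, tests, 0.025))
print([round(100 * v) for v in interval])             # [7, 65]
# 95 per cent singular limits: the whole 5 per cent goes in one tail.
print(round(100 * max_limit(failures, tests, 0.05)))  # 61
print(round(100 * min_limit(failures, tests, 0.05)))  # 9
```

Passing `alpha` rather than a confidence level keeps the interval and singular-limit cases explicit: an interval spends half of the permitted error in each tail, a singular limit spends all of it in one.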

l. In either case, the use of a singular confidence limit
permitted the statement to be a little more precise in the field
of interest. In other examples that might be cited, the gain in
precision is more appreciable. It must be EMPHASIZED, however,
that the experimenter must select his confidence level and must
decide whether he is interested in a confidence interval or a
singular limit, and in the latter case which limit, before he analyzes
his data. If he favors one technique or the other after he sees
the data, he is biasing the probability involved in the computation
of the tables, and he will lower his over-all batting average
below the indicated confidence level, as these tables are computed
upon the assumption that they will be used in a consistent manner.
Suppose, for example, that he observes no failures in ten tests.
If he had intended to determine a maximum confidence limit only,
he may say with 95 per cent confidence (Table II) that the true
failure rate is less than 26 per cent. If he had intended to
specify a confidence interval, he must say that the true failure
rate lies in the interval of 0 per cent to 31 per cent. He cannot
jump from an interval type of analysis to a singular limit type of
analysis only when he observes no failures.

m. Personally I favor the use of a singular limit technique
for ordnance component testing, as the gain in precision is
considerable when no failures are observed, and this gain in precision
is legitimate if this analysis is also used when failures are
observed. When failures are observed, there is a loss in precision
if the singular limit technique is employed, but this is probably
of no consequence, as the experimenter is only interested in how
high (or how low) the true failure rate might be, or he would not
have adopted the single limit type of analysis in the first place.

n. Tables I and IV are used in a similar manner for other
confidence levels that the experimenter prefers, although it should
be noted that the 95 per cent confidence level is quite popular
with statisticians and experimenters.

o. The ordnance experimenter who uses Tables I through IV
soon discovers that he does not often have his data for series of
tests consisting of 10, 20, or 50 samples. He may have had
sufficient funds to test only 6 units, for example. He has observed
no failures and is exhibiting a pardonable pride in this achievement.
But what definitive statement can he make in his report? With what
assurance can he recommend comparatively large scale manufacture of
the item?


p. I have prepared Table V for the purpose of answering
these questions. If the experimenter believes that a 95 per cent
confidence level is desirable and if he has decided before performing
the test that he is going to make a singular limit statement for
the maximum limit, he makes the statement that the true failure
rate will be less than 39 per cent for a very large number of items
identical to those he has tested. He makes this rather unencouraging
statement with the knowledge that he will be wrong in 5 per cent of
such statements. Even if he were willing to operate on an 80 per
cent confidence level (and this is not recommended), he can only
state that the true failure rate is less than 24 per cent.* It is
small wonder that many projects which appear to offer great promise
on the basis of a few exploratory tests later lead to disappointment.
Conversely, it should be remembered that the best estimate of the
true failure rate is the one observed, 0 per cent in this example.
If the experimenter has faith, which is not amenable to statistical
analysis, in his item and his own ability to remedy such deficiencies
as may later become apparent, he may proceed with enthusiasm and
high emotional, if not statistical, confidence.
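The footnote to paragraph p refers to a formula on page 21 that is not reproduced in this copy. A standard relation for a series with no failures, assumed here, is that the maximum limit p at confidence level C satisfies (1 - p)^n = 1 - C; that is, a failure rate as high as p would produce n straight successes only 100(1 - C) per cent of the time. The minimal sketch below (the function name is hypothetical) reproduces the 39 and 24 per cent of paragraph p and the 26 per cent of paragraph l.

```python
def max_limit_no_failures(n, level):
    """Singular maximum limit on the failure rate after n tests with no
    failure: the rate p at which (1 - p)**n equals 1 - level."""
    return 1 - (1 - level) ** (1 / n)

for n, level in ((6, 0.95), (6, 0.80), (10, 0.95)):
    pct = round(100 * max_limit_no_failures(n, level))
    print(f"{n} tests, {level:.0%} single limit: less than {pct} per cent")
```

Applying the same function at the 97 1/2 per cent level gives the upper end of a 95 per cent interval: about 31 per cent for ten tests without a failure.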

q. Realizing that too few tests in a series results in a
very broad confidence interval or a high maximum confidence limit,
the experimenter may ask how many successful tests without a failure
are necessary before he can claim to have met a maximum confidence
limit specified by the Bureau or his supervisor. I have prepared
Table VI for this purpose. It may also be used for estimating the
quantities of items required for test purposes.

r. The experimenter should examine very critically any
requirements stated in statistical terms that may be given him by
higher authority. For example, the requirement of "99 per cent
functioning at 95 per cent confidence" reads very nicely, but it
might be prohibitively costly to demonstrate for anything more
expensive than machine gun bullets. Since the higher authority is
interested only in how high the failure rate is, a single confidence
limit approach is legitimate. The experimenter will have to test
299 items without a failure (Table VI) in order to demonstrate that
he has met the requirement. This is no mean feat and will not often
be achieved unless the true failure rate is somewhat under 1 per
cent.

*This value was computed from the formula at the bottom of page 21.


s. If the experimenter observes 10 failures in 1000 tests,
he has observed a functioning rate of 99 per cent, it is true, but
he cannot say that he has met the requirement at the required
confidence level. He can say that the true failure rate is less than
2 per cent, yet the chances are that the originator of the requirement
would be delighted if he knew that even 100 items had been tested
with only a 1 per cent observed failure rate, although the 95 per
cent confidence statement would then be that the true failure rate
was less than 4 per cent.

t. The originator of the requirement in paragraph r probably 
hoped that the failure rate of the items tested would be less than 
one per cent. This is a legitimate hope and should be stated in 
this manner. To restate the hope as a requirement in statistical
terms automatically shifts the emphasis from the failure rate of 
the samples to be tested to the expected maximum failure rate of 
an infinite number of items and may pose an insurmountable and 
probably unintended problem for the experimenter. The experimenter 
should immediately object to such a requirement and propose an al- 
ternate specification in precise statistical terms which he believes 
can be achieved and state his reasons therefor. He will thereby 
retain the respect of his superiors whereas he may only incur censure 
and criticism if he accepts the project without protest and then 
attempts to justify the performance of his item and lower the required 
specifications in his final report. 

u. The experimenter should not overlook the possibility of
utilizing his test data to establish more than one conclusion. For
example, a certain maximum confidence limit may have been specified
with respect to a rocket motor exploding upon ignition before the
motor in question may be used on an aircraft. The cost of the
number of motors that must be fired to demonstrate this limit is quite
high, but there is no reason why these motors may not be the same
ones used to determine the ground launched dispersion and other
performance characteristics of the rocket which must be measured in
an over-all evaluation program. The motor explosion is sufficiently
spectacular to warrant attention by someone whenever it occurs, so
no particular preparation for observing this phenomenon is needed.
In other cases, however, the inclusion of an extra camera or other
instrument may permit the accumulation of data for a considerable
number of items that are being tested primarily for other purposes.
The decrease in the confidence interval that occurs when the number
of items tested is increased is spectacular for small numbers of
items and warrants the extra planning involved.

v. The tables given in this part may also be used to draw 
conclusions as to the serviceability of ammunition which has been 
in storage for a long period of time or under adverse conditions. 
For the results to be valid, it is essential that the selection of
the items to be tested be made completely at random. 
CONFIDENCE LIMITS (PER CENT) FOR BINOMIAL DISTRIBUTION - TABLE I

Confidence Level: 80 per cent for an interval, i.e., a pair of limits
                  90 per cent for a single limit, either max. or min.

*If percentage observed in sample exceeds 50 per cent, read
100 minus the percentage observed and subtract each confidence
limit from 100.
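The rule in the note above follows from the fact that x failures in n tests are the same event as n - x successes. In the sketch below (helper names hypothetical), the probability of at most x failures at failure rate p equals the probability of at least n - x failures at rate 1 - p, so every confidence limit for an observed percentage above 50 per cent mirrors, about 100, a limit for the complementary percentage.

```python
from math import comb, isclose

def binom_pmf(k, n, p):
    """Probability of exactly k events in n tests at rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_most(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

def prob_at_least(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

# x failures at failure rate p is the same event as n - x "failures"
# of the complementary kind (successes) at rate 1 - p.
n = 10
for x in range(n + 1):
    for p in (0.1, 0.35, 0.6, 0.9):
        assert isclose(prob_at_most(x, n, p), prob_at_least(n - x, n, 1 - p))
print("mirror identity holds for n =", n)
```

Because the tail probabilities mirror exactly, the limit equations mirror too, which is why a table need only be printed for observed percentages up to 50 per cent.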
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CONFIDENCE LIMITS — TABLE II (Continued) 



«rlf percentage observed exceeds 5° per cent, read 100 minus percent 
age observed, and subtract each confidence limit from 100 » 


15 








CONFIDENCE LIMITS (PER CENT) FOR BINOMIAL DISTRIBUTION - TABLE III

Confidence Level: 95 per cent for an interval, i.e., a pair of limits
                  97 1/2 per cent for a single limit, either max. or min.

(Select level and type of limit desired before analyzing data.)
CONFIDENCE LIMITS (PER CENT) FOR BINOMIAL DISTRIBUTION - TABLE III

(Continued)

                         Size of sample
Per cent
observed*    10        20        50        100       1000
   36                            23  50    27  46    33  39
   37                                      28  47    34  40
   38                            25  53    28  48    35  41
   39                                      29  49    36  42
   40                  19  64    27  55    30  50    37  43
   41                                      31  51    38  44
   42                            28  57    32  52    39  45
   43                                      33  53    40  46
   44                            30  59    34  54    41  47
   45                  23  68              35  55    42  48
   46                            32  61    36  56
   47                                      37  57    44  50
   48                            34  63    38  58    45  51
   49                                      39  59    46  52
   50       19  81     27  73    36  64    40  60    47  53

*If percentage observed in sample exceeds 50 per cent, read
100 minus percentage observed and subtract each confidence limit
from 100.
TABLE IV (Continued)

*If percentage observed in sample exceeds 50 per cent, read 100
minus the percentage observed and subtract each confidence limit
from 100.
TABLE V

UPPER OR MAXIMUM CONFIDENCE LIMIT (PER CENT) FOR
A SERIES OF TESTS WITHOUT A FAILURE

Confidence levels for an interval type analysis:
    80%      90%      95%      98%      99%

Confidence levels for a singular limit analysis:
    90%      95%      97 1/2%  99%      99 1/2%

If the last test, the tenth, for example, fails, Tables I, II,
III, or IV must be used. It is not valid to ignore the last test
and draw a conclusion from this table on the basis of nine tests
without a failure. Furthermore, the experimenter must be prepared
to use a single confidence limit type of analysis when he observes
failures if he had planned to use this analysis in case he did not
observe any failures.



TABLE VI

NUMBER OF TESTS TO BE PERFORMED WITHOUT A FAILURE IN ORDER TO
DEMONSTRATE THE ACHIEVEMENT OF A SPECIFIED
MAXIMUM FAILURE RATE

Inasmuch as a specified maximum failure rate is, by the
nature of its specification, a singular confidence limit, the
confidence levels specified below refer to this type of analysis.
The experimenter who proposes to use this table if he observes
no failures must also be prepared to express his results in the
form of a maximum singular confidence limit if he does observe
failures.


Specified             Confidence levels for a singular limit analysis
maximum
confidence       90%      95%      97 1/2%    99%      99 1/2%
limit (%)
     1           230      299      370                 530
     2           115      149      184        229      263
     3            76       99      122        152      174
     4            57       74       91        113      130
     5            45       59       72         90      103
    10                     29
    15            15       19       23         29       33
    20            11       14       17         21       24
    25             8       11


For example, an experimenter who wishes to be right on 95
per cent of his statements must perform 99 tests without a failure
before he can say that the failure rate for an infinite number of
the items is less than 3 per cent. (The formula is the same as
for the single limit type analysis under Table V.) Obviously, if
the true failure rate were 2 per cent, for example, he would most
likely get one or two failures in these 99 tests.* Unfortunately, a
sample of 99 with two failures could also have come from a
population having a failure rate in excess of 3 per cent, the 95 per
cent maximum confidence limit from Table II being 5 per cent for
these results. The above table is therefore useful for an
experimenter who believes that his failure rate is essentially zero but
must demonstrate that it is less than a specified percentage. It
is of no value to a man who believes his failure rate is just under
the requirement. Even if he is correct in this belief, he should
plan to test approximately 1000 samples before he can demonstrate
this fact at 95 per cent confidence.

*i.e., one failure and two failures are equally likely, and each is
more likely than any other number.


PART III 


Purpose:

To define terms.

a. In a series of perfectly controlled tests, the same answer should be obtained for each test. This is usually impossible, because the item under test cannot be duplicated exactly, the test conditions cannot be reproduced exactly, and the measuring equipment has imperfections or a lack of resolution.

b. A good series of controlled tests is one in which the undesired variables are almost insignificant. Data might be as follows:

10.01     10.03     10.00     9.99

c. A poor series of controlled tests is one in which the undesired variables cause the data to be less consistent. Data might be as follows:

7.83     9.01     11.05     9.89

d. In order to study the effect of changing a characteristic 
of the item under test, the series of tests should be as good as 
possible, so that the change in performance, if this change exists, 
will not be hidden in a large spread of the data. 

e. Assume data was obtained as follows:

9     12     10     9

The average or mean for an infinite number of these tests is desired but is not known. The average of the above series of four tests is 10 and is our best estimate of the average for an infinite number of tests. Generally, the more tests, the better the estimate. In other words, the sample average of 10 is the best estimate of the population average.

f. The goodness or badness of the series of tests may be measured in three ways:

(1) The range. This is the difference between the maximum and the minimum, or 3 in the above example. The range gets larger as the number of tests increases, since the probability of the various undesired variables combining to form a very high or very low reading increases with the number of tests. The range, therefore, is not the best measure of whether the experiment is good or bad, although it may be very useful, as noted in par. n.

(2) The mean deviation or average deviation. The following table may be written:

            Best Estimate
Data        of Average        Deviation
  9                              -1
 12                              +2
 10             10                0
  9                              -1

If the signs of the deviations are neglected and these deviations are averaged, the result is the mean deviation. This quantity appears attractive, but, to quote Villars, the distribution of this estimate is not amenable to simple mathematical manipulation, so it is not recommended.

(3) The standard deviation. The table is repeated:

            Best Estimate
Data        of Average        Deviation     Deviation Squared
  9                              -1                1
 12                              +2                4
 10             10                0                0
  9                              -1                1

The sum of the squares of the deviations is 6. This is divided by 3, not 4, to give 2, the best estimate of the variance. The exact reason for this division by one less than the number of tests requires a study of statistics, but it may be understood intuitively by noting that the observed average value 10 which was used in the computation was only the best estimate of the true average. It was too high or too low depending on how the data fell, and this error in the observed average tended to make the deviations smaller than they should be. Hence, by dividing the sum of squares by one less than the number of tests, the best estimate of the variance is obtained. (Warning: Some authors of standard texts ignore this refinement, but for small numbers of tests it becomes significant.) The standard deviation is the square root of the variance, so the best estimate of the standard deviation for the above data is the square root of 2, or 1.4. This may be expressed as the relative standard deviation, which is a percentage of the average: 1.4 divided by 10 equals 14 per cent. The standard deviation may also be called the standard error.
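The quantities defined in par. f can be computed together. In this minimal sketch the function name `describe` is hypothetical. Python's `statistics.stdev` also divides by one less than the number of tests, and the last line checks that it agrees with the hand computation.

```python
import math
import statistics


def describe(data):
    """Range, mean deviation, and standard deviation of a series of tests."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]
    sum_sq = sum(d * d for d in deviations)
    variance = sum_sq / (n - 1)           # divide by one less than the number of tests
    std_dev = math.sqrt(variance)
    return {
        "mean": mean,
        "range": max(data) - min(data),
        "mean_deviation": sum(abs(d) for d in deviations) / n,
        "sum_sq": sum_sq,
        "variance": variance,
        "std_dev": std_dev,
        "relative_pct": 100 * std_dev / mean,   # relative standard deviation
    }


result = describe([9, 12, 10, 9])
print(result)
print(statistics.stdev([9, 12, 10, 9]))  # same n - 1 convention
```

The same function can be applied to the good and poor series of pars. b and c to compare their spread numerically.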

g. What is the utility of the best estimate of the standard deviation? Before investigating this question, let us consider for a moment the meaning of the standard deviation of an infinitely large number of items. The form of the distribution of the individual answers about the average value is not generally known, but the normal, or Gaussian, distribution is generally preferred for the testing of commercially manufactured articles. A crude sketch of this distribution is given in the accompanying graph (Normal Frequency Curve). The percentages indicated thereon refer to the fraction of the total number of items tested which will fall between the indicated limits. For example, 68.27 per cent of all the items will have values between the average minus the standard deviation and the average plus the standard deviation. Similarly, 95.44 per cent of all the items will have values between the average minus twice the standard deviation and the average plus twice the standard deviation. It may be seen that 50 per cent of all the items will have values between the average minus about 2/3 of a standard deviation and the average plus 2/3 of a standard deviation. This 50 per cent point is referred to as the probable error and has been used considerably in the older literature.
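The percentages in par. g can be checked with Python's `statistics.NormalDist`. In this sketch the helper names are hypothetical. The code gives 95.45 per cent within two standard deviations; the memo's 95.44 is the same value truncated rather than rounded.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Fraction of a normal population within +/- k standard deviations of the average."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


def half_width(fraction):
    """Multiple of the standard deviation enclosing the given central fraction."""
    return STANDARD_NORMAL.inv_cdf((1 + fraction) / 2)


for k in (1, 2, 3):
    print(f"within +/-{k} std. dev.: {100 * fraction_within(k):.2f} per cent")
print(f"probable error (50 per cent): +/-{half_width(0.50):.4f} std. dev.")
print(f"80 per cent: +/-{half_width(0.80):.3f} std. dev.")
```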

[GRAPH: Normal Frequency Curve, marking the intervals about the average that contain 50, 68.27, 80, 90, and 95.44 per cent of the population, namely ±0.6745, ±1, ±1.282, ±1.645, and ±2 standard deviations.]

The percentages refer to the fraction of the total population which will have a value within the limits indicated for each percentage; i.e., 68.27 per cent of the population has values between the average ± one std. deviation, and 80 per cent of the population has values between the average ± (1.282)(std. deviation).

h. A rough statement suitable for memorizing is that 2/3 of all tests will have values lying within one standard deviation of the average, 95 per cent will have values lying within two standard deviations of the average, and practically all will lie within three standard deviations. The experimenter therefore looks at his best estimate of the standard deviation of 1.4 in the example posed in par. e and his best estimate of the average of 10, and states that 95 per cent of a large number of tests will yield answers between 10 - 2.8 and 10 + 2.8, or between 7.2 and 12.8. Upon the basis of this statement, a large number of the items are made, but it is found that fewer than 95 per cent of them have values between 7.2 and 12.8. The experimenter accepts this philosophically and retires to his laboratory licking his bruises and his performance rating. Next time he makes a similar statement a little hesitantly, but he makes it, and it also lands on him but hard. His faith in statistics has not been reinforced.

What's wrong? 

i. The difficulty probably lies not in the assumption of the normal distribution of the data, but in the fact that the best estimates of the average and the standard deviation will often be considerably in error when these best estimates are computed from a limited amount of data, as will become evident when confidence limits are computed for these quantities in Parts IV and VI of this memorandum.

j. Tables of tolerance factors for normal distributions have been computed by statisticians and may be found in reference h of the Bibliography. The use of such tables is illustrated by reference to Table B and the example previously discussed in this Part. Table B is for the 95 per cent confidence level. At this level of confidence, we may say that at least 95 per cent of a large number of tests will fall between our estimated average of ten plus or minus 6.37 times the estimated standard deviation of 1.4. This turns out to be from 1 to 19, which is not very definitive, to say the least, BUT IT IS THE BEST STATEMENT THAT CAN BE MADE AT 95 PER CENT CONFIDENCE.
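Table B itself is not legible in this copy. As a cross-check on the factors quoted in pars. j through l, the sketch below computes Howe's approximation for a two-sided normal tolerance factor, k = z * sqrt((n - 1)(1 + 1/n) / chi2), where z is the standard normal point enclosing the stated coverage and chi2 is the lower (1 - confidence) point of chi-square with n - 1 degrees of freedom. This is a swapped-in approximation, not necessarily how the memo's tables were computed. It agrees with the quoted factors 2.310, 2.752, and 3.615 for 20 items to within about 0.002, and gives about 6.40 against the tabulated 6.37 for four items. All function names are hypothetical, and the chi-square routine is a plain series expansion written only for this example.

```python
import math
from statistics import NormalDist


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    k = 1
    while term > total * 1e-15:
        term *= y / (a + k)
        total += term
        k += 1
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_ppf(p, df):
    """Inverse of chi2_cdf by bisection."""
    lo, hi = 0.0, max(1.0, 2.0 * df)
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def howe_factor(n_items, coverage, confidence):
    """Approximate two-sided normal tolerance factor (Howe's approximation)."""
    df = n_items - 1
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    chi2 = chi2_ppf(1 - confidence, df)
    return z * math.sqrt(df * (1 + 1 / n_items) / chi2)


for n, coverage in ((4, 0.95), (20, 0.99), (20, 0.95), (20, 0.90)):
    print(n, coverage, round(howe_factor(n, coverage, 0.95), 3))
```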

k. Let us turn to another, more reasonable example. Suppose that 20 flares have been tested, the average burning time for these items has been found to be 90 seconds, and the best estimate of the standard deviation has been found to be 4.7 seconds. What are the tolerance limits for 99 per cent of a large number of similar items? Referring to Table B, it may be said with 95 per cent confidence that at least 99 per cent of an infinite number of flares will have burning times between the average plus or minus (3.615)(4.7) seconds, or between 73 and 107 seconds. At least 95 per cent of an infinite number of flares will have burning times between 90 plus or minus (2.752)(4.7) seconds, or between 77 and 103 seconds, also at 95 per cent confidence. I like these tables because they enable the experimenter to make statements which are meaningful to the military and civilian authorities who have to make decisions relative to the probable utility of a weapon.
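Once a factor has been read from Table B, the tolerance limits are simple arithmetic. This minimal sketch (hypothetical function name) reproduces the examples of pars. j and k, with the factors typed in from the text.

```python
def tolerance_limits(average, std_dev_estimate, factor):
    """Two-sided tolerance limits: average minus/plus factor times the estimated std. dev."""
    half_width = factor * std_dev_estimate
    return average - half_width, average + half_width


# (average, estimated std. dev., Table B factor) for the examples in pars. j and k
examples = {
    "4 tests, 95% of population": (10, 1.4, 6.37),
    "20 flares, 99% of population": (90, 4.7, 3.615),
    "20 flares, 95% of population": (90, 4.7, 2.752),
}
for label, args in examples.items():
    low, high = tolerance_limits(*args)
    print(f"{label}: {low:.0f} to {high:.0f}")
```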

l. In many applications, the use of a singular tolerance limit is justified. In the preceding example, the experimenter may know that the maximum time that the flares might burn is of no interest but that there is a keen interest in the minimum burning time. Again referring to Table B, he may say with 95 per cent confidence that at least 95 per cent of the flares will have a burning time of at least 79 seconds (90 minus 2.310 x 4.7).*
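The singular limit in par. l uses the factor 2.310 from the 90 per cent "pair of tolerances" column, which Table B's header pairs with a 95 per cent single tolerance. The sketch below (hypothetical names) computes the lower limit with and without the roughly 5 per cent conservative allowance that the memo recommends for single tolerances.

```python
def lower_tolerance_limit(average, std_dev_estimate, pair_factor, allowance=1.0):
    """Singular (lower) tolerance limit from a pair-of-tolerances factor.

    allowance=1.05 applies the memo's suggested conservative 5 per cent increase.
    """
    return average - pair_factor * allowance * std_dev_estimate


print(round(lower_tolerance_limit(90, 4.7, 2.310), 1))        # factor as read
print(round(lower_tolerance_limit(90, 4.7, 2.310, 1.05), 1))  # with 5 per cent added
```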

m. Even if the reader of a report is trained in statistics, which most readers are not, the specification of a standard deviation that is in reality only a best estimate of a standard deviation computed from a limited number of samples will almost invariably lead the reader to expect better results than he will realize, because he visualizes the expected performance on the basis of accurate, not estimated, averages and standard deviations. The specification of tolerance limits avoids this difficulty and expresses the results of the experiment in a form that is understandable to anyone who has mastered grade school arithmetic.

*This use of the table is not absolutely rigorous. It will be sufficiently accurate if the experimenter adds approximately 5 per cent to the value taken from the table. The computation would then be (90 minus 2.310 x 1.05 x 4.7).

n. To go back to the range of the observed data, which was defined as the lowest observed value subtracted from the highest: as noted, the standard deviation is a better criterion, and its estimate should be computed. If this is not practical in the field, a rough estimate of the standard deviation may be computed mentally after each test from the following table, which is worth memorizing:

No. of tests      Divide range by this number to get
in series         rough estimate of standard deviation
     4                         2
     6                         2 1/2
    10                         3
    15                         3 1/2
    30                         4
   100                         5

o. In the original example, the range was 3 and there were 4 tests in the series. Dividing the 3 by 2 gives 1.5 as a rough estimate of the standard deviation, whereas the computation gave 1.4 as the best estimate. The above table also serves well in the evaluation of reported test results when the data is given in terms of the standard deviation. For example, if a report says there were 30 tests in the series and the average reading was 25.0 sec with a standard deviation of 4.0, the approximate range may be computed to be 16, so it is most probable that the readings varied from 17 to 33. If the same data were given for a series of 10 tests, the chances are that the range was 12 and the readings varied from 19 to 31. If the tests described in the report were to be duplicated, information of this type would be useful in setting the range of measuring equipment and/or the position of cameras.
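The range table of par. n can be used in both directions, as par. o describes. This sketch hard-codes the memo's rounded divisors (the names are hypothetical) and covers only the tabulated series sizes; other sizes would need interpolation, which is not attempted here.

```python
# Divisors from the memo's range table (rounded values).
RANGE_DIVISORS = {4: 2.0, 6: 2.5, 10: 3.0, 15: 3.5, 30: 4.0, 100: 5.0}


def rough_std_dev(data_range, n_tests):
    """Rough standard deviation estimated from the range of a series of n_tests."""
    return data_range / RANGE_DIVISORS[n_tests]


def likely_range(std_dev, n_tests):
    """Reverse use: the range to expect from a reported standard deviation."""
    return std_dev * RANGE_DIVISORS[n_tests]


print(rough_std_dev(3, 4))                  # original example, 4 tests
half_spread = likely_range(4.0, 30) / 2     # report of 30 tests, average 25.0
print(25.0 - half_spread, 25.0 + half_spread)
```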

p. There is a slick alternate method for computing the sum of squared deviations which gives the same answer but avoids the tedious computing of individual deviations. Add the individual readings in one column. Add the squares of the readings in a second column. Square the sum of the first column and divide by the number of tests. Subtract this number from the sum of the second column to get the sum of squared deviations. The example is worked below.

Reading      Square
    9           81
   12          144
   10          100
    9           81
  ---          ---
   40          406

(40)(40) / 4 = 1600 / 4 = 400
406 - 400 = 6 = sum of squared deviations

This method offers great advantages if a calculating machine is available, as discussed in reference f of the Bibliography.
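Both routes to the sum of squared deviations are sketched below (hypothetical function names); they agree on the example data. One design caution: on a digital computer the shortcut subtracts two large, nearly equal numbers when the readings share a large common offset, which can lose precision in floating-point arithmetic, so the direct form (or `statistics.variance`) is usually preferred there.

```python
def sum_sq_direct(data):
    """Sum of squared deviations from the average, computed term by term."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)


def sum_sq_shortcut(data):
    """Calculating-machine shortcut: sum of squares minus (sum squared) / n."""
    total = sum(data)
    total_of_squares = sum(x * x for x in data)
    return total_of_squares - total * total / len(data)


readings = [9, 12, 10, 9]
print(sum_sq_direct(readings), sum_sq_shortcut(readings))
```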


NOTE: In using Tables A, B, and C, it is a conservative practice to add approximately 5 per cent to the values taken from the table when a single tolerance is being computed.
TABLE A

TOLERANCE FACTORS FOR 90 PER CENT CONFIDENCE

These factors, expressed singularly or in ± form and multiplied by the best estimate of the standard deviation, provide tolerances with respect to the observed average for the indicated percentage of an infinite population.
TABLE B

TOLERANCE FACTORS FOR 95 PER CENT CONFIDENCE

These factors, expressed singularly or in ± form and multiplied by the best estimate of the standard deviation, provide tolerances with respect to the observed average for the indicated percentage of an infinite population.

No. of items    Percentage of an infinite population for which
tested          a pair of tolerances is required:   75%     90%    95%     99%     99.9%
                a single tolerance is required:     87.5%   95%    97.5%   99.5%   99.95%
TABLE C

TOLERANCE FACTORS FOR 99 PER CENT CONFIDENCE

These factors, expressed singularly or in ± form and multiplied by the best estimate of the standard deviation, provide tolerances with respect to the observed average for the indicated percentage of an infinite population.

No. of items    Percentage of an infinite population for which
tested          a pair of tolerances is required:   75%     90%    95%     99%     99.9%
                a single tolerance is required:     87.5%   95%    97.5%   99.5%   99.95%
PART IV


Purpose:

To determine the confidence limits for an average of a series of tests.

a. If a series of tests is performed, the average of the answers is the best estimate of the average for an infinite number of tests. The estimate is better if there are a large number of tests in the series, and the estimate is better if the variance and standard deviation are low. The experimenter knows that the average for the series is the best possible estimate of the true average, but it is necessary to assign confidence limits to the average before any statement of how good the estimate is may be made.

b. To review:

(1) The deviation of a particular test in a series is the difference between the answer for that test and the average for the series.

(2) The sum of squared deviations is self-explanatory.

(3) The variance is the sum of squared deviations divided by one less than the number of tests in the series.

(4) The standard deviation is the square root of the variance.

c. A problem will be proposed and solved. Assume that the penetration of a shaped charge is measured in inches as follows:

Data    Avg.    Dev.    Dev. Sq.    Variance    Std. Dev.
 55              +5        25
 47      50      -3         9
 48              -2         4
                          ----
                           38          19          4.36

NOTE: It is assumed that the population sampled has a normal, or Gaussian, distribution in this and all succeeding parts.
The best estimate of the true average is 50. If we had selected three different samples, a different average for the second series would be obtained. If this sampling and testing in series of three were continued, a whole set of averages would be obtained. Since these could be tabulated, we could average them, compute deviations, square them, and find a variance for the average. This large amount of testing is not required, however, as it may be shown that the variance for the average is merely the variance for an individual test divided by the number of tests in a series. For our example, 19 divided by 3 gives 6.33 as the variance for averages, each of which is computed from a series of three tests. The standard deviation for this type of average is √6.33, or 2.52.

d. We may say that the true average lies between

$$50 \pm t\,(\text{std. dev. of average}) = 50 \pm t\sqrt{s^2/n}$$

where s² is the variance for an individual test (19) and n is the number of tests in the series (3). The value of t is taken from the Table of Student's t Distribution. The degrees of freedom is the number we divided the sum of squared deviations by, or 2 in this example (one less than the number of tests in the series). If we require 95 per cent confidence, t = 4.30. The confidence interval limits for the average are therefore

$$50 \pm (4.30)(2.52) = 50 \pm 10.8$$

so the confidence interval for the average is 39.2 to 60.8.

If we can afford to be right only 80 per cent of the time, t = 1.886, and the limits for the average are therefore

$$50 \pm (1.886)(2.52) = 50 \pm 4.8$$

so the confidence interval for the average is 45.2 to 54.8.

e. It is interesting to note that the confidence interval for 95 per cent confidence covers a zone more than twice as wide as the total spread of the three observations (47 to 55). A study of the t table shows that t decreases very rapidly for the first ten tests in a series, but the next ten are of much less value in this respect.


f. If several series of tests have been performed, each with some characteristic of the item under test slightly different, there is a means whereby the number of degrees of freedom may be increased and a smaller confidence interval obtained for the average for each series. This technique is described in Part V, paragraph e, and should be utilized whenever applicable.

g. If the experimenter is only interested in how high or how low the true average may be, he can make a more precise statement employing only a single confidence limit and still retain his confidence level. In the example above, using t = 2.92 (the value for a singular 95 per cent limit with 2 degrees of freedom), the maximum confidence limit for the true value of the average is

$$50 + (2.92)(2.52) = 57.4$$

The formal statement is therefore: "The average value for three tests was 50 inches, and the writer is confident (95 per cent) that the true average for an infinite number of items made to identical specifications will be less than 57.4 inches."

h. A similar but separate statement could have been made to 
the effect that the true value of the average would be greater 
than 42.6 inches, but the statements cannot be combined into a 
single statement without lowering the confidence to a 90 per cent 
level. 
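The computations of pars. c, d, and g can be reproduced as below. Python's standard library has no Student t quantile function, so the t values for 2 degrees of freedom (4.303, 1.886, 2.920) are typed in from a standard t table; that, and the function name, are assumptions of this sketch. Taking the 95 per cent singular upper and lower limits together gives the 90 per cent interval mentioned in par. h.

```python
import math


def average_and_sd_of_average(data):
    """Average of a series and the standard deviation of that average."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)   # per-test variance
    return mean, math.sqrt(variance / n)                      # std. dev. of the average


mean, sd_avg = average_and_sd_of_average([55, 47, 48])

# t values for 2 degrees of freedom, copied from a Student's t table.
T_INTERVAL_95 = 4.303   # 95 per cent interval
T_INTERVAL_80 = 1.886   # 80 per cent interval
T_SINGLE_95 = 2.920     # 95 per cent singular limit

for label, t in (("95% interval", T_INTERVAL_95), ("80% interval", T_INTERVAL_80)):
    print(label, round(mean - t * sd_avg, 1), round(mean + t * sd_avg, 1))
print("95% upper limit", round(mean + T_SINGLE_95 * sd_avg, 1))
print("95% lower limit", round(mean - T_SINGLE_95 * sd_avg, 1))
```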


i. The t table has been appropriately labeled for the determination of both a confidence interval and a singular confidence limit. The experimenter may use either an interval statement or a singular confidence limit statement, as he sees fit, for each series of tests.

j. Data of this type will not fit a normal distribution exactly, as the normal distribution extends from -∞ to +∞ and our data cannot be less than zero. If the average is of the same order of magnitude as the standard deviation of the average, the confidence interval may extend into the negative region, an obvious impossibility. A statistician should be consulted, as there are many other types of distributions, and he can probably fit the data to one of these.

k. Although stating an average for a series of tests together with a confidence interval for that average is standard practice, I believe that the tolerance limits discussed in par. j of Part III are easier to compute and provide the reader with a more usable type of information.
PART V


Purpose:

To determine whether a deliberate change in the item under test changes its performance.

a. If a series of tests were perfect, and the answers identical, even the slightest change in the item would be immediately apparent. Tests are not perfect, and it is therefore more difficult to tell if an apparent change in performance is caused by the deliberate change or by the inherent perversity of the experiment. The "Student's t Test" is often of assistance, and its use will be explored.

b. To review:

(1) The deviation of a particular test in a series is the difference between the answer for that test and the average for the series.

(2) The sum of squared deviations is self-explanatory.

(3) The variance is the sum of squared deviations divided by one less than the number of tests in the series.

(4) The standard deviation is the square root of the variance.

c. Assume the penetration of a shaped charge is measured in inches as follows for a series of three tests with a certain liner. The composition of the liner is changed from Alloy A to Alloy B and the series is repeated.

              A        B
             55       55
             47       64
             48       64
Total       150      183
Average      50       61


It appears that Alloy B is better, but maybe the data would have spread out that much if the first series of tests had consisted of six tests and the liner had not been changed. The liners, being different, are referred to as treatments.

d. The problem will be worked and explained simultaneously. (The problem and the solution were taken partially from Dr. Villars' notes for a forthcoming book on small sample statistics, which has since been published and is listed in the Bibliography.) We know that the best estimate of the average for Treatment A is 50 and for Treatment B is 61. The difference, which shall be designated x, is 11. If we were to run a number of identical series of tests, each series to consist of three tests, we know that the individual averages (each a best estimate for its series) would not be identical, although these averages would group closer than individual answers. Our question resolves itself into an inquiry as to whether a difference in averages of 11 is worth getting excited over.

e. First, the perversity of the test must be considered. This is best measured by computing best estimates for the variance and standard deviation. This poses a problem. The results cannot be combined about a gross average of 55 1/2, as we suspect the two series are not identical. Neither can a very accurate estimate be made from a series of only three tests. However, our change in liner should not have affected the degree of perversity of the test, so the two series can be combined to advantage as follows:

Data     Average     Deviation     Deviation Squared
 55                     +5                25
 47        50           -3                 9
 48                     -2                 4
 55                     -6                36
 64        61           +3                 9
 64                     +3                 9
                         Sum of Squared Deviations   92

NOTE: It is not essential that both series consist of an equal number of tests.

f. For one series of tests, the sum of squared deviations was divided by one less than the number of tests in the series to get the best estimate of the variance. With two series, we divide by a number which is the sum of two numbers. Each number is the number that would have been used for its particular series alone. In this case, two would have been used for each series, so two plus two equals four, and the sum of squared deviations is divided by four. If the second series had had four tests, the number for this series would have been three, and two plus three equals five.

(If the data had been taken from three series of three tests each with three different liners, the number would have been two plus two plus two equals six, but that may be skipped for the moment.)

g. Dividing the sum of squared deviations of 92 by 4 gives 23, which is the best estimate of the variance for the type of experiment we are performing. The best estimate of the standard deviation is the square root of 23, or 4.8.

h. To return to the variance of 23: if the variance for each individual test is known, the variance for the difference between averages of two different series of tests may be estimated as

$$\text{Variance for the difference in averages} = (\text{variance for each test})\left(\frac{1}{m} + \frac{1}{n}\right) = 23\left(\frac{1}{3} + \frac{1}{3}\right) = 15.33$$

where m and n are the respective numbers of tests in each series.

i. The square root of this variance of 15.33 is 3.92, which is the best estimate of the standard deviation of the difference in averages of different identical series of tests, with three tests per series, of the type under discussion.


j. If we define

$$\text{computed } t = \frac{x}{\text{std. dev. for diff. in avgs.}} = \frac{11}{3.92} = 2.81,$$

we may refer to the t graph at the end of this part. This graph is a plot of the t table at the end of Part IV. Either may be used; the graph obviates the need for interpolation, but the table is more precise. The degrees of freedom is a term that goes with our estimate of the variance for each test and is the number we divided the sum of squared deviations by to get this variance. For our example, we have four degrees of freedom, and we use the curve corresponding to four. We find the computed value of t, namely 2.81, on the vertical axis, move across to the curve, and read the abscissa of this point on the horizontal scale at the bottom. For our example, the value is slightly less than 0.05. This is the probability of our getting a difference of 11 or more in two averages of three tests each if the alloy of the liners had not been changed. Stated another way, only five times in a hundred, on the average, would we observe such a great difference in averages if there were no difference between Alloy A and Alloy B.
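The procedure of pars. e through j (pool the squared deviations, divide by the combined degrees of freedom, scale by 1/m + 1/n, then form t) can be sketched as follows. Python's standard library has no Student t distribution, so the two-sided probability here is obtained by integrating the t density numerically with Simpson's rule. That integration stands in for reading the t graph and, like the function names, is an assumption of this sketch. The same functions also reproduce the second example in par. n (t = 1.15 with 5 degrees of freedom, probability about 0.3).

```python
import math


def pooled_t(series_a, series_b):
    """Student t for the difference in averages (B minus A), pooling the variance."""
    def sum_sq_dev(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs)

    m, n = len(series_a), len(series_b)
    df = (m - 1) + (n - 1)                        # combined degrees of freedom
    variance = (sum_sq_dev(series_a) + sum_sq_dev(series_b)) / df
    sd_diff = math.sqrt(variance * (1 / m + 1 / n))
    diff = sum(series_b) / n - sum(series_a) / m
    return diff / sd_diff, df


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t with df degrees of freedom (Simpson's rule; steps even)."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = density(0.0) + density(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 1 - 2 * total * h / 3                  # area outside -|t|..|t|


t, df = pooled_t([55, 47, 48], [55, 64, 64])
print(f"t = {t:.2f} with {df} degrees of freedom, P = {two_sided_p(t, df):.3f}")
```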


k. Therefore it seems reasonable to state that there is a difference between Alloy A and Alloy B at 95 per cent confidence. We can expect to be wrong in only 5 per cent of such statements. To simplify the use of the t graph, the appropriate confidence level values have been filled in along the top of the graph. The experimenter computes his value of t, enters the graph using the appropriate degrees of freedom curve, and determines the confidence level at which he can proclaim that one treatment surpasses the other.

l. A rose is a rose, etc., according to Stein, and in statistics significance has significance. To the uninitiated experimenter, a significant difference in two treatments, such as the liners in the shaped charges, is a worthwhile difference in terms of performance or economic saving or some other criterion. Not so in the phraseology of some statisticians. A significant change is any difference which can be proclaimed at 95 per cent confidence. For our example, our test demonstrates that there is a significant difference in the two alloys. If our experiment were less perverse, or if we could have afforded more items in each series, our experiment would have been more sensitive and might have been able to claim a significant difference even though the observed difference in the performance of the two alloys was actually less, in terms of inches of penetration.

m. If the computed t were greater than 4.60, the superiority of one liner with respect to the other would have been proclaimed at 99 per cent confidence, and this would have been defined statistically as a highly significant difference. Likewise, if the computed t were greater than 8.61, the difference would be proclaimed at 99.9 per cent confidence, and the difference would be considered very highly significant. This rather arbitrary phraseology is not objectionable if the reader of a technical report understands what these terms mean. For example, if the computed t had equaled 2.20, some statisticians might say that "no significant difference was demonstrated." I might say that "Alloy B is better than Alloy A, at better than 90 per cent confidence." The statements appear contradictory, but they are not, because the statistician prefers not to commit himself unless he can be correct in at least 95 per cent of his statements. His statement that no significant difference was demonstrated is his way of declining to express an opinion, but it may be misleading to a reader who is not versed in the phraseology.

For this reason, I do not recommend using the terms significant, highly significant, and very highly significant as synonyms for certain degrees of confidence. State the apparent direction of the superiority, and indicate the confidence level.

n. For a little practice in applying the Student t test, another example is proposed as follows:

            Series A      Series B
               71            67
               67            86
               33
               79
               42
Average       58.4          76.5

Was the difference in averages of 18.1 a reliable indication of the superiority of Treatment B over Treatment A?


Data     Average     Deviation     Deviation Squared
 71                    +12.6            158.76
 67                     +8.6             73.96
 33        58.4        -25.4            645.16
 79                    +20.6            424.36
 42                    -16.4            268.96
 67                     -9.5             90.25
 86        76.5         +9.5             90.25
                                       --------
                                       1,751.70


Divide by five (four plus one) to get a variance of 350.34 for each test. The variance for the difference in the averages equals 350.34(1/5 + 1/2), which equals 245.238. The standard deviation for the difference is the square root of this, or 15.7. The difference in averages is 18.1. Computed t = 18.1/15.7 = 1.15 for 5 degrees of freedom. This corresponds to a significance level of 0.3; that is, there is a probability of 0.3 of getting this great a difference in the two averages if there were no difference in the two treatments. We can conclude that Treatment B is better, but we can expect to be right in only 70 per cent of such statements. More tests are needed when the data scatters this badly.

o. To return to the example in par. c of this part, suppose the experimenter works on a 99 per cent confidence level. What difference in the two averages must exist for him to make a positive statement?

$$\text{tabular } t = \frac{\text{min. } x}{\text{std. dev. for diff. in avgs.}}, \qquad 4.604 = \frac{\text{min. } x}{3.92}, \qquad \text{min. } x = 18.05$$

If the difference in averages is less than 18.05, a 99-per-cent-confidence man will not state definitely that Alloy B is better than Alloy A. He may perform more tests per series. This will give him a better estimate of the variance per test and more degrees of freedom, so a smaller difference may be "highly significant."

To illustrate this, the experimenter wants to know how many tests 
per series he must perform before the present difference in average 
of 11 (assuming this difference does not change appreciably) may 
be considered' "highly significant." 1 

$$\text{Tabular } t = 4.604 = \frac{11}{\text{std. dev. for diff. in avg. (max.)}}$$

Maximum standard deviation for difference in averages = 2.39; variance for this standard deviation = 5.72.

$$5.72 = 23\left(\frac{1}{n} + \frac{1}{n}\right)$$

n = 8+, or 9 tests per series.
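The same calculation in a short sketch, with the tabular t held at its 4-degree-of-freedom value exactly as in the text. The variance of 23 and the difference of 11 inches are the figures quoted above. Carrying full precision gives n of about 8.06 rather than 8.04, which still rounds up to 9.

```python
import math

VAR_PER_TEST = 23.0
TARGET_DIFF = 11.0   # observed difference in averages, inches
T_99_4DOF = 4.604    # tabular t held fixed at the 4-dof value, as in the text

max_sd_diff = TARGET_DIFF / T_99_4DOF   # about 2.39
max_var_diff = max_sd_diff ** 2         # about 5.71
# Equal series: var_diff = VAR_PER_TEST * (1/n + 1/n) = 2 * VAR_PER_TEST / n
n_exact = 2 * VAR_PER_TEST / max_var_diff
n = math.ceil(n_exact)
print(f"n = {n_exact:.2f}, so {n} tests per series")
```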

This would appear to be conservative because he will then have 16 degrees of freedom and the tabular t will be lower, actually 2.921. A person could assume that the variance estimate of 23 and the average difference of 11 will not change, and solve by trial and error for the minimum number of tests per series which will indicate at the 99 per cent confidence level that there is a difference in the two treatments. Unfortunately, this technique will reduce to 50 per cent the probability that a real difference of 11 inches in the two treatments will be detected at the 99 per cent confidence level.
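The trial-and-error search can be sketched as follows. As in the paragraph, it assumes that the variance estimate of 23 and the difference of 11 inches do not change. The two-sided 99 per cent t values are standard table entries, and the function name `min_tests_per_series` is hypothetical. Under these assumptions the search stops at 5 tests per series (8 degrees of freedom). That is exactly the kind of answer the paragraph warns gives only about an even chance of detecting a real 11-inch difference.

```python
import math

VAR_PER_TEST = 23.0
TARGET_DIFF = 11.0
# Two-sided 99 per cent values of Student's t (standard tables), keyed by dof.
T_99 = {4: 4.604, 6: 3.707, 8: 3.355, 10: 3.169, 12: 3.055, 14: 2.977, 16: 2.921}


def min_tests_per_series(var, diff, t_table):
    """Smallest n (two equal series) whose computed t reaches the tabular t."""
    for n in range(3, 10):
        dof = 2 * (n - 1)
        sd_diff = math.sqrt(var * 2 / n)   # var * (1/n + 1/n)
        computed_t = diff / sd_diff
        if computed_t >= t_table[dof]:
            return n, dof, computed_t
    return None


print(min_tests_per_series(VAR_PER_TEST, TARGET_DIFF, T_99))
```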

Actually, the experimenter may have to perform more, not fewer, than 9 tests per series to provide a high assurance that he will detect the required difference if it does exist. The solution of this problem of determining the number of tests per series is not elementary, and I would recommend that the reader consult a statistician or, if this is impracticable, make an intensive study of pages 17 to 26 of reference 4 of the Bibliography. It is also suggested that the experimenter determine, from his knowledge of the application of the item, just what improvement he considers worth detecting. In our example, it may be that an improvement of 5 inches in the average penetration would be worthwhile; this will be difficult to establish. If an improvement must amount to 20 inches before it is of interest, this can be easily detected. The choice of 11 inches merely because it was observed in the initial tests is not a logical method of choosing the difference in performance that is of interest.

p. If the averages do not differ by a "significant" amount, 
the two series of tests are often lumped together by statisticians 
if this is desirable. 

q. To return to the equation for the variance of the difference in averages:

$$\text{Variance for diff. in avg.} = \text{var. per test}\left(\frac{1}{m} + \frac{1}{n}\right)$$

It is desirable to have m = n for the most efficient comparison of two series. The expression $\left(\frac{1}{m} + \frac{1}{n}\right)$ should be small so that the computed t will be large. For a given total number of tests, it will be least when m = n.
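The claim that the factor is least when m = n can be checked by brute force. The total of 10 tests below is a hypothetical figure chosen only for illustration.

```python
def variance_factors(total):
    """(m, n, 1/m + 1/n) for every split of `total` tests into two series."""
    return [(m, total - m, 1 / m + 1 / (total - m)) for m in range(1, total)]


rows = variance_factors(10)  # hypothetical total of 10 tests
for m, n, factor in rows:
    print(f"m = {m}, n = {n}, 1/m + 1/n = {factor:.3f}")

best = min(rows, key=lambda row: row[2])
print(f"smallest factor at m = {best[0]}, n = {best[1]}")
```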

r. In some cases m = n is not practical, but the formulae are still applicable. The number of degrees of freedom is equal to m - 1 plus n - 1.


s. If one series of tests consists of one test only, the $\left(\frac{1}{m} + \frac{1}{n}\right)$ term reduces to $\left(1 + \frac{1}{n}\right)$, and the variance per test must be taken from the series of n tests only; the number of degrees of freedom is therefore n - 1.

t. If one series of tests consists of n tests and the other value is assumed to be precise and representative of a very large number of tests, or is a requirement, the $\left(\frac{1}{m} + \frac{1}{n}\right)$ term reduces to $\frac{1}{n}$, and the variance per test must be taken from the series of n tests only. Therefore the number of degrees of freedom is n - 1.
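Paragraphs r through t can be summarized in one small function. It is a sketch only. The name `comparison_terms` is hypothetical, and passing `m=None` to stand for a precise or required value is a convention of this example, not of the text.

```python
def comparison_terms(m, n):
    """Return the (1/m + 1/n) factor and the degrees of freedom.

    m is the number of tests in the first series; m=None means the first value
    is a requirement or is treated as precise (par. t). With m == 1 the general
    formula already gives 1 + 1/n and n - 1 degrees of freedom (par. s), since a
    single test contributes nothing to the estimate of the variance per test.
    """
    if m is None:
        return 1 / n, n - 1
    return 1 / m + 1 / n, (m - 1) + (n - 1)


print(comparison_terms(5, 2))     # unequal series (par. r)
print(comparison_terms(1, 4))     # one series of a single test (par. s)
print(comparison_terms(None, 4))  # comparison with a requirement (par. t)
```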

u. There is another approach that may be used to indicate the difference between two treatments, such as the use of Alloy A and Alloy B in the shaped charge liners in the example in par. c of this part. The best estimate of the difference in the performance of heads with the two types of liners is 11 inches. What are the confidence limits on this difference in performance of 11 inches?

As noted in par. i of this part, the best estimate of the standard deviation of the difference in these averages is 3.92. For four degrees of freedom and 80 per cent confidence, the tabular t = 1.533. The confidence limits for the difference are therefore

$$11 \pm (3.92)(1.533) = 11 \pm 6.01$$

The lower and upper 80 per cent confidence limits are therefore 5.0 and 17.0 for the difference in performance of the two liners.

If 95 per cent confidence is required, the tabular t = 2.776 and the limits are

$$11 \pm (3.92)(2.776) = 11 \pm 10.9$$

The lower and upper 95 per cent confidence limits are therefore 0.1 and 21.9. In some respects, this method of comparing two treatments may be more informative than merely saying that there is a difference at a specified confidence level.
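The interval technique of par. u amounts to the observed difference plus or minus the tabular t times 3.92. A sketch using the tabular t values quoted in the text (the helper name `interval` is hypothetical):

```python
SD_DIFF = 3.92    # std. dev. of the difference in averages (par. i)
DIFF = 11.0       # observed difference, Alloy B minus Alloy A, inches
# Two-sided tabular t for 4 degrees of freedom, as quoted in the text.
TABULAR_T = {80: 1.533, 95: 2.776}


def interval(diff, sd_diff, t):
    """Lower and upper confidence limits for a difference in averages."""
    half_width = t * sd_diff
    return diff - half_width, diff + half_width


for conf, t in TABULAR_T.items():
    low, high = interval(DIFF, SD_DIFF, t)
    print(f"{conf} per cent: {low:.1f} to {high:.1f} inches")
```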

v. It was noted in par. m of this part that a 99 per cent-confidence man would refrain from making any statement about the superiority of Alloy B over Alloy A on the grounds that the data was not sufficient to convince him. What will happen if we use the technique of par. u at the 99 per cent confidence level? The tabular t equals 4.604, so the limits are

$$11 \pm (3.92)(4.604) = 11 \pm 18.1$$

The upper confidence limit is now 29.1; the lower confidence limit is -7.1! Small wonder that a 99 per cent-confidence man refuses to commit himself. He could say, "Yes, I will state that Alloy B is superior to Alloy A by a margin of from minus 7.1 inches to plus 29.1 inches." Such an oblique statement is hardly suitable for a formal report, however, so it is preferable to make a more definitive statement at a lower level of confidence. The experimenter should always indicate his confidence level with his statements.

w. Returning to Part IV, in which the technique of determining confidence limits for the average of a series of tests was discussed, it may be shown that a better estimate of the variance per test can be obtained if several series of tests have been performed, even though a different treatment was employed for each series. The perversity of the experiment should remain the same for each series even though the liners are made of different alloys, and it is the perversity of the experiment, as expressed by the variance, that leads to differences between sample averages and the true average for a large number of tests. Therefore, if the series of tests proposed in Part IV were the first of the two series proposed in par. c of this part, the 95 per cent confidence limits for the average of the first series may be recomputed to be

$$50 \pm 2.776\sqrt{\frac{23}{3}} = 50 \pm 7.7$$

or 42.3 to 57.7.

It is of interest to note that although the best estimate of the variance is a larger number (23 compared to 19), this is more than counterbalanced by the additional certainty obtained by more tests and hence more degrees of freedom.
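The comparison in par. w can be made concrete. In this sketch the pooled limits use the figures in the text. The single-series limits assume the Part IV figures of an average of 50, a variance of 19 and two degrees of freedom, with the standard two-sided 95 per cent t of 4.303. The helper name is hypothetical.

```python
import math

AVERAGE = 50.0   # average of the first series
N_TESTS = 3      # tests in the first series


def mean_interval(avg, var_per_test, n, t):
    """Confidence limits for an average: avg +/- t * sqrt(variance / n)."""
    half_width = t * math.sqrt(var_per_test / n)
    return avg - half_width, avg + half_width


# Pooled variance from both series: 23, with 4 degrees of freedom.
pooled = mean_interval(AVERAGE, 23.0, N_TESTS, 2.776)
# First series alone (assumed Part IV figures): variance 19, 2 degrees of freedom.
single = mean_interval(AVERAGE, 19.0, N_TESTS, 4.303)
print("pooled: %.1f to %.1f" % pooled)
print("single series: %.1f to %.1f" % single)
```

The single-series interval comes out wider (about 39.2 to 60.8) despite its smaller variance, which is the counterbalancing effect the paragraph describes.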

x. "Why didn't he mention the use of a singular confidence limit in par. u? I'm only interested in the minimum (or maximum) amount by which Alloy B may be superior to Alloy A, so why should I bother with an interval technique?" Okay, you can use a singular confidence limit technique, IF and only IF you are really interested in the superiority of Alloy B over Alloy A. BUT you must know that you are interested in this BEFORE you analyze the data. You cannot look at the data to see which Alloy came out ahead and then decide which one you are interested in. If you wish to use the results of the test to decide which treatment is superior, then you must use an interval technique as described in par. u.

y. Look at it this way. Suppose the experimenter decides in advance that he will subtract the second average from the first average, regardless of how the results come out. In our example, the difference would then be -11 inches. The lower 95 per cent confidence limit would be -21.9 inches and the upper 95 per cent confidence limit -0.1 inches; the 95 per cent interval would extend from -21.9 inches to -0.1 inches. (I suppose the report would say that Alloy A was superior to Alloy B by minus 11 inches!) But suppose that Alloy A had an average 11 inches greater than that for Alloy B? The lower limit is now +0.1 inches and the upper limit is +21.9 inches. Note how the absolute numerical values of the upper and lower limits have switched on us, depending upon which Alloy came out ahead. Which limit is the upper and which is the lower depends upon which treatment comes out ahead, and this dependency makes it impossible to use the singular confidence limit type of analysis.
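The switching of the limits in par. y is easy to see numerically. A sketch using the 95 per cent figures from par. u (the helper name is hypothetical):

```python
SD_DIFF = 3.92   # std. dev. of the difference in averages
T_95 = 2.776     # two-sided 95 per cent t, 4 degrees of freedom


def interval(diff):
    """95 per cent limits for a difference, rounded to 0.1 inch."""
    half_width = T_95 * SD_DIFF
    return round(diff - half_width, 1), round(diff + half_width, 1)


print(interval(-11.0))  # second average subtracted from the first
print(interval(+11.0))  # same data if Alloy A had come out 11 inches ahead
```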

z. Let us assume that the experimenter knew in advance that he was only interested in the possible superiority of Alloy B over Alloy A, since Alloy B is much more expensive and more difficult to fabricate. He could then decide in advance to subtract the average for Alloy A from that for Alloy B, regardless of how the results came out. In our example, his difference would be plus 11 inches. After he has computed a value of t in the usual manner, he can compare it with the tabular t values taken from the singular limit confidence levels shown in the table. He can then state the level at which he is confident that Alloy B is superior to Alloy A.

(If the difference came out negative, or the confidence level is less than 50 per cent, he will state that the superiority of Alloy B was not demonstrated, but he cannot make any statement relative to the possible superiority of Alloy A.) He can compute a minimum or maximum limit (but not both) relative to the superiority of Alloy B over Alloy A. Typical statements for our example might be: "The writer is 95 per cent confident that Alloy B is superior to Alloy A by at least 2.6 inches of penetration"(1) or "The writer is 95 per cent confident that the superiority of Alloy B over Alloy A will not exceed 19.4 inches."(2)

(1) 11 - (3.92)(2.132) = 2.6

(2) 11 + (3.92)(2.132) = 19.4
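The computed t that par. z says to compare with the singular-limit table, and the two limits in the notes, can be sketched as follows. The one-sided 95 per cent value of 2.132 for four degrees of freedom is the one used in the notes. A computed t of about 2.81 exceeds it. Only one of the two limits may be quoted, and the choice must be made before the data are seen.

```python
SD_DIFF = 3.92          # std. dev. of the difference in averages
DIFF = 11.0             # Alloy B minus Alloy A, order fixed in advance
T_ONE_SIDED_95 = 2.132  # singular-limit tabular t, 4 degrees of freedom

t = DIFF / SD_DIFF                          # computed t
minimum = DIFF - T_ONE_SIDED_95 * SD_DIFF   # "at least" statement
maximum = DIFF + T_ONE_SIDED_95 * SD_DIFF   # "will not exceed" statement
print(f"t = {t:.2f}; minimum {minimum:.1f}; maximum {maximum:.1f}")
```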
aa. If a manufacturing facility proposes the relaxation of a fabricating tolerance in order to reduce the percentage of rejects, or the substitution of propellant material of inferior physical characteristics, the experimenter is not interested in whether the proposed change will yield better results; in this example he knows that it won't. He is only interested in whether the performance will be impaired. He agrees to subtract the average performance for several items utilizing the proposed change from the average performance for the items utilizing the original design, and then performs a singular limit type analysis. He would probably solve for a maximum confidence limit and say, "The writer is 95 per cent confident that the decrease in the mean effective pressure caused by the utilization of the inferior propellant as recommended by the loading plant will not exceed ______ pounds per square inch." It is interesting to note that even if the difference in averages turns out to be zero or slightly negative, the maximum confidence limit will still exist and may still be positive, since the maximum confidence limit is the observed difference plus the tabular t times the standard deviation for the difference in the averages.

bb. To summarize, if the determination of the superiority or inferiority (but not both) of one specified treatment with respect to another is the sole problem, a singular limit type analysis is more informative and should be employed. If the determination of which treatment is superior is the problem, the interval type analysis discussed in paragraphs a through w should be employed.
PART VI

Purpose: 

To determine the confidence limits for the standard deviation 
of a series of tests. 

a. After the experimenter computes a best estimate of the standard deviation of a series of tests, he may ask himself how good this estimate is. To answer this question, he may wish to compute the confidence interval or a singular confidence limit for the standard deviation. Generally speaking, he will be interested in only the maximum confidence limit, although in the case of a weapon which depends upon dispersion for its effectiveness, such as a shotgun or a barrage technique, he may wish to compute a confidence interval or even a minimum confidence limit.

b. At the end of this Part is a table of B factors* which are to be multiplied by the standard deviation. If the experimenter wishes to know the maximum confidence limit for the standard deviation, he selects his confidence level, enters the table on the line opposite the appropriate number of degrees of freedom, and selects the B2 factor, which he then multiplies by the standard deviation to obtain the maximum confidence limit for his standard deviation. He can then state that the true standard deviation for an infinite number of items will be less than this value. If he were interested in a minimum confidence limit, he would have selected the B1 factor.

c. If the experimenter were interested in a confidence interval 
he would have utilized both the factors to obtain the upper and lower 
limits for his interval but he would have entered the table in a 
different column for the same confidence level. 

d. The number of degrees of freedom is one less than the number of items tested in a single series of tests. Using the example from Part IV, the best estimate of the standard deviation was found to be the square root of 19, or 4.36. Three items were tested, so there are two degrees of freedom. At 95 per cent confidence, the experimenter can say that the standard deviation is less than 19.2 (4.406 x 4.36). If he were interested in a confidence interval, he could say at 95 per cent confidence that the standard deviation is between 2.27 and 27.4 (0.521 x 4.36 and 6.285 x 4.36).

*This table was prepared by Eleanor G. Crow, who has simplified the classical technique by tabulating the square root of the quotient of the number of degrees of freedom and the value of χ².
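The footnote's recipe, B = √(degrees of freedom / χ²), can be checked directly. This is a minimal sketch that assumes an even number of degrees of freedom, so that the chi-square distribution function has a simple closed form. The function names are hypothetical, and bisection is just one convenient way to invert the distribution function. For two degrees of freedom it reproduces the 0.521 and 6.285 used for the 95 per cent interval. For the 95 per cent maximum limit it gives about 4.415; the table's 4.406 corresponds to a chi-square value rounded to 0.103, so the maximum limit comes out near 19.3 rather than 19.2.

```python
import math


def chi2_cdf_even(x, dof):
    """Chi-square distribution function; closed form valid for even dof only."""
    if dof % 2:
        raise ValueError("this sketch handles even degrees of freedom only")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, dof // 2):
        term *= half / k           # (x/2)**k / k!
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, dof):
    """Value of chi-square with lower-tail probability p, found by bisection."""
    low, high = 0.0, 1000.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if chi2_cdf_even(mid, dof) < p:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def b_factor(p, dof):
    """sqrt(dof / chi-square), the multiplier for the standard deviation."""
    return math.sqrt(dof / chi2_quantile_even(p, dof))


s, dof = 4.36, 2
print(f"95 per cent maximum limit: {b_factor(0.05, dof) * s:.1f}")
print(f"95 per cent interval: {b_factor(0.975, dof) * s:.2f} "
      f"to {b_factor(0.025, dof) * s:.1f}")
```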

e. The interval computed above is quite broad and even the 
maximum confidence limit is unencouragingly high. This serves to 
emphasize the fact that a good estimate of the standard deviation 
can not be obtained with only three tests in a series. The statements 
become much more precise as the number of tests in the series is 
increased. If the experimenter had performed the two series of tests as described in Part V, he would have an estimate of the standard deviation of 4.80 and four degrees of freedom to work with. At 95 per cent confidence, he obtains a maximum confidence limit of 11.4, or a confidence interval of from 2.87 to 13.8, for the standard deviation. A general rule for the computation of the number of degrees of freedom is to subtract the number of different series of tests (in this case two series) from the grand total of the number of items tested (in this case six items). The justification for the apparent shenanigan of combining two series of tests which are considered non-identical lies in the assumption that the perversity, or tendency of the data to spread, is the same for each series of tests, since only the alloy of the liner was changed, and this should not affect the degree of nonreproducibility of a series of tests.

The standard deviation is a measure of the perversity of an experiment, so it and its confidence limits can better be computed from both series of tests.
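The pooled case works the same way. In this sketch the degrees of freedom follow the rule just stated (six items less two series), and the standard deviation is the square root of the pooled variance of 23. The B factors for four degrees of freedom are computed from standard chi-square values (0.7107, 11.143 and 0.4844) rather than read from the table in this Part, which is an assumption of the example.

```python
import math

ITEMS_TESTED = 6   # two series of three items
SERIES = 2
dof = ITEMS_TESTED - SERIES   # general rule: total items less number of series
s = math.sqrt(23.0)           # pooled standard deviation, about 4.80

# B factors for 4 degrees of freedom, sqrt(4 / chi-square), computed here from
# standard chi-square values rather than read from the table in this Part.
B_MAX_95 = math.sqrt(dof / 0.7107)    # lower 5 per cent point of chi-square
B_LOW_95 = math.sqrt(dof / 11.143)    # upper 2.5 per cent point
B_HIGH_95 = math.sqrt(dof / 0.4844)   # lower 2.5 per cent point

print(f"dof = {dof}, s = {s:.2f}")
print(f"95 per cent maximum limit: {B_MAX_95 * s:.1f}")
print(f"95 per cent interval: {B_LOW_95 * s:.2f} to {B_HIGH_95 * s:.1f}")
```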